Abstract: We discuss and review recent developments in the area of applied 
algebraic topology, such as persistent homology and barcodes. In particular, we 
discuss how these are related to understanding more about manifold learning 
from random point cloud data, the algebraic structure of simplicial complexes 
determined by random vertices and, in most detail, the algebraic topology of 
the excursion sets of random fields. 

1. Introduction 

Over the last few years there has been a very interesting and rather exciting devel- 
opment in what is reputedly one of the most esoteric areas of pure mathematics: 
algebraic topology. Some of the practitioners of this subject are, to a considerable 
extent, looking out beyond the inner beauty of their subject and seeing if they can 
apply it to problems in the 'real world', that is to problems outside the realm of 
pure mathematics. As a result 'applied algebraic topology' is no longer an oxy- 
moron, and although it is true that at this point sophisticated applications are still 
few and far between, there is a growing feeling that the gap between theory and 
practice is closing. We shall give more specific references below, but a very lively 
discussion of this trend can be found in Rob Ghrist's review [23], book in progress 
[24] and website on a project on sensor topology for minimal planning [25]. Gunnar 
Carlsson's webpage [11], which describes a large Stanford TDA (topological data 
analysis) project, and a DARPA webpage [16] describing a broad based project, 
also help explain the reasons why so many people have been so attracted to this 
direction. 

These ideas are not totally new. For example, the brain imaging community has 
been using random field modelling and topological properties of these fields for 
quite some time. In particular, people like Karl Friston, a leading figure in medical 
imaging, have been talking about the notion of 'topological inference' for a while (cf. 
the website [22]), based in large part on the work of the late Keith Worsley. What 
is new, however, is the coordinated attack of a goodly number of high powered 
mathematicians on applications. 

The aim of the current paper is to describe some of the new ideas that have 
arisen in applied algebraic topology and, given the interests of the authors, exploit 
some of them in the setting of random fields, i.e. of random processes defined over 
spaces of dimension greater than one. There are new results here, albeit without 
proofs. However, this paper is mainly review and exposition, with a strong bias in 
a particular direction, but written in a language which we hope will be accessible 
to the natural readers of this Festschrift who may (as did we until recently) find the 
language of even applied topologists somewhat unfamiliar. In the final analysis, if 
Larry will be happy with the final product, then we shall be happy as well. 

The paper starts in Section 2 with a discussion of one of the central notions of ap- 
plied algebraic topology, that of persistent homology and its graphical depiction via 
barcodes. This is done via examples rather than formal definitions, so it should be 
possible to understand the notion of persistent homology without actually knowing 
what a homology group is. (For those who do know about homology, more precise 
definitions of persistent homology and barcodes are given in the appendix of Sec- 
tion 6.) Also in Section 2 we discuss simplicial complexes as they arise in manifold 
learning and also discuss the topology of random field excursion sets. 

Section 3 has a brief discussion of persistence diagrams of excursion sets, based 
on simulations. (These have actually already been used elsewhere for the analysis of 
brain imaging data, see [15].) These data raise numerous challenges for statisticians 
and probabilists. 

Section 4 introduces what seems at first to be a rather abstruse structure of 'Euler 
integration', but it is very quickly shown that not only is this a useful concept, but 
the key to solving a number of quite varied problems. This is the main section of 
the paper. 

A very brief Section 5 points out that we should have also had more to say 
about random simplicial complexes, but didn't, and so points you to appropriate 
references. A brief technical appendix completes the paper in Section 6. 

2. Persistent homology and barcodes 

In this section we are going to give a very brief and sketchy introduction to some 
basic notions of algebraic topology. A concise, yet very clear introduction to the 
topics that concern us can be found in [9, 24], while [26, 37] are good examples of a 
thorough coverage of homology theory. Recent excellent and quite different reviews 
by Carlsson [10, 13], Edelsbrunner and Harer [20], and Ghrist [1, 17, 18, 23] give a 
broad exposition of the basics of persistent homology. 

Algebraic topology focuses on studying topology by assigning algebraic, group 
theoretic, structures to topological spaces $X$. Thus, homology, cohomology and 
homotopy groups can be used to classify objects into classes of 'similar shape'. In this 
paper we shall focus on homology. If $X$ is of dimension $N$, then it has $N+1$ homology 
groups, each one of which is an abelian group. (We shall later take the coefficients 
from $\mathbb{Z}_2$, thereby making the groups vector spaces.) The zero-th homology $H_0(X)$ 
is generated by elements that represent connected components of $X$. For $k \ge 1$ the 
$k$-th homology group $H_k(X)$ is generated by elements representing $k$-dimensional 
'loops' in $X$. The rank of $H_k(X)$, denoted by $\beta_k$, is called the $k$-th Betti number. 
For $X$ compact and $k \ge 1$, $\beta_k$ measures the number of $k$-dimensional holes in $X$, 
while $\beta_0$ counts the number of connected components. The Euler characteristic, a 
central topological quantity and homotopy invariant, is then 

(2.1)    $\chi(X) = \sum_{k=0}^{N} (-1)^k \beta_k$. 

To explain the idea of persistent homology, we shall work with two examples. 
The first is based on what is known as the 'Morse filtration' of excursion sets, the 
second on complexes formed from point sets. 

2.1. Barcodes of excursion sets 

Suppose that $M$ is a nice space, that $f : M \to \mathbb{R}$ is smooth, and consider the 
excursion, or super-level, sets 

(2.2)    $A_u = \{p \in M : f(p) \in [u,\infty)\} = f^{-1}([u,\infty))$. 

Note that if $u > v$ then $A_u \subset A_v$. Going from $u$ to $v$, components of $A_u$ may merge 
and new components may be born and possibly later merge with one another or with 
the components of $A_u$. Similarly, the topology of these components may change, as 
holes and other structures form and disappear. Following the topology of these sets, 
as a function of $u$, by following their homology, is an example of persistent homology. 
The term 'persistence' comes from the fact that as the level $u$ changes there is no 
change in homology until reaching a level $u$ which is a critical value of $f$ (the height 
of a critical point); i.e. the topology of the excursion sets remains static, or 'persists', 
between the heights of critical points. This, of course, is the basic observation of Morse 
Theory, which links critical points to homology. However, the persistence of persistent 
homology goes further. For example, when two components merge, one treats the first 
of these to have appeared as if it is continuing its existence beyond the merge level. 

A useful way to describe persistent homology is via the notion of barcodes. 
Assuming that $\dim(M) = N$, we also have, from the smoothness of $f$, that, if $A_u$ 
is non-empty, then $\dim(A_u)$ will typically also be $N$. A barcode for the excursion 
sets of $f$ is then a collection of $N + 1$ graphs, one for each collection of homology 
groups of common order. A bar in the $k$-th graph, starting at $u_1$ and ending at 
$u_2$ ($u_1 > u_2$), indicates the existence of a generator of $H_k(A_u)$ that appeared at level 
$u_1$ and disappeared at level $u_2$. An example is given in Figure 2.1, in which the 
function $f$ is actually the realisation of a smooth random field on the unit square, 
an example to which we shall return later. 

Figure 2.2 is even more impressive, since it shows a three dimensional example. 
Note that, as opposed to the 2-dimensional case, it is almost impossible to say 
anything about topology just by looking at the boxes with the excursion sets at 
the top of the figure, but there is a lot of immediate visual information available in 
the barcodes. This phenomenon becomes even more marked as the dimension $N$ of 
the parameter space increases. While it may be impossible to imagine what a five 
dimensional excursion set looks like, it is easy to look at a barcode with six sets of 
bars for the six persistent homologies. 

2.2. Point sets and manifold learning 

Consider the following situation. Let $X$ be an unknown subset of $\mathbb{R}^d$ with finite 
Lebesgue measure and let $X_1, \ldots, X_n$ be $n$ independent random samples uniformly 
distributed on $X$. We would like to study the homology of $X$ using only these 
random points. When $X$ is a manifold, this is typically referred to as manifold 
learning. In many cases we can find an $\varepsilon$ for which the union of balls 

(2.3)    $U = \bigcup_{k=1}^{n} B_\varepsilon(X_k)$ 

is homotopy equivalent to $X$ (and hence has the same homology). However, we do 
not know, a priori, what is the correct choice of $\varepsilon$. An example is given in Figure 
2.3, in which $X$ is a two-dimensional annulus. If $\varepsilon$ is chosen to be too small then 
$U$ is homotopy equivalent to the union of $n$ distinct points (and hence contains 
no information on $X$). On the other hand, choosing $\varepsilon$ to be too big gives us a $U$ 
that is a large, contractible blob, which again tells us nothing about $X$. But, as 
with Goldilocks' porridge, choosing $\varepsilon$ 'just right' recovers an object topologically 
equivalent to the annulus. Persistent homology overcomes this sensitivity to the 
choice of $\varepsilon$ by considering a range of possible values of $\varepsilon$, much as we did with the 
levels of excursion sets in the previous example, but with the aim of learning about 
the topology of $X$ from the barcodes. The key assumption is that homology elements 
that 'live longer' (or, persist) are more likely to represent homology elements of $X$, 
whereas the shorter ones are just 'noise'. 

To describe this in a little more detail we need the notion of simplicial complexes. 

Fig 2.1. Barcodes for the excursion sets of a function on $[0,1]^2$. The top seven boxes show the 
surfaces generated by a 2-dimensional random field above excursion sets $A_u$ for different levels 
$u$. To determine the level for each figure, follow the vertical line down to the scale at the bottom 
of the barcode. As the vertical lines pass through the boxes labelled $H_0$ and $H_1$, the number of 
intersections with bars in the $H_0$ ($H_1$) box gives the number of connected components (resp. holes) 
in $A_u$. Thus, at $u \approx 1.9$, $A_u$ has 4 connected components but no holes, while at $u \approx -1.2$, $A_u$ 
has only 1 connected component, but 9 holes. The horizontal lengths of the bars indicate how 
long the different topological structures (generators of the homology groups) persist. Computation 
of the barcodes was carried out in Matlab using Plex (Persistent Homology Computations) from 
Stanford [12]. 

Fig 2.2. Barcodes for the excursion sets of a 3-dimensional random field. The barcode diagram 
is to be read as for Figure 2.1, with two differences: The top 7 boxes now display the excursion 
sets themselves and the values of the field are colour coded. Furthermore, there are now three 
homology-groups/barcode-boxes, representing connected components, handles, and holes. 




Fig 2.3. Trying to capture the homology of an annulus (where $\beta_0 = 1$, $\beta_1 = 1$) from a union 
of balls of various radii around a random sample of points from the annulus. A good choice of 
radius recovers the correct homology in the first case. If the radius chosen is too small, the union 
of balls has the same homology as $n$ distinct points ($\beta_0 = n$, $\beta_1 = 0$). If the radius chosen is too 
big, the union is contractible ($\beta_0 = 1$, $\beta_1 = 0$). 



2.3. Simplicial complexes 

We are not going to give a definition of simplicial complexes here, but rather shall 
describe two classic ways to construct abstract simplicial complexes from a given 
set of points in a metric space. 

Definition 2.1 (The Čech Complex). Let $\mathcal{V} = \{x_1, x_2, \ldots\}$ be a collection of points 
in a metric space $X$. Construct an abstract simplicial complex $C(\mathcal{V},\varepsilon)$ in the 
following way: 

1. The 0-simplices are the points in $\mathcal{V}$. 

2. An $n$-simplex $[x_{i_0}, \ldots, x_{i_n}]$ is in $C(\mathcal{V},\varepsilon)$ if $\bigcap_{k=0}^{n} B_\varepsilon(x_{i_k}) \neq \emptyset$, 

where $B_\varepsilon(x)$ is the ball of radius $\varepsilon$ around $x$. The complex $C(\mathcal{V},\varepsilon)$ is called the Čech 
complex attached to $\mathcal{V}$ and $\varepsilon$. 

Definition 2.2 (The Vietoris-Rips Complex). Let $\mathcal{V} = \{x_1, x_2, \ldots\}$ be a collection of 
points in a metric space $X$. Construct an abstract simplicial complex $R(\mathcal{V},\varepsilon)$ in the 
following way: 

1. The 0-simplices are the points in $\mathcal{V}$. 

2. An $n$-simplex $[x_{i_0}, \ldots, x_{i_n}]$ is in $R(\mathcal{V},\varepsilon)$ if $B_\varepsilon(x_{i_k}) \cap B_\varepsilon(x_{i_m}) \neq \emptyset$ for every 
$0 \le k < m \le n$. 

The complex $R(\mathcal{V},\varepsilon)$ is called the Rips complex attached to $\mathcal{V}$ and $\varepsilon$. 

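From these definitions it is obvious that $C(\mathcal{V},\varepsilon) \subset R(\mathcal{V},\varepsilon)$. In addition, it is 
proved in [17] that, for points in $\mathbb{R}^d$, $R(\mathcal{V},\varepsilon') \subset C(\mathcal{V},\varepsilon)$ for $\varepsilon/\varepsilon' \ge \sqrt{2d/(d+1)}$. 
In other words, a Čech complex can be 'approximated' by Rips complexes. This fact is 
used in computational applications, since a Rips complex is determined by pairwise 
distances alone, so that working with Rips complexes is much more efficient 
than with Čech complexes. 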
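
The gap between the two complexes can be seen already for three points in the plane. The following is a minimal Python sketch of our own (not code from [17]); the function names are hypothetical. It assumes closed balls, so that two $\varepsilon$-balls meet when their centres are at distance at most $2\varepsilon$, and it uses the elementary fact that three $\varepsilon$-balls share a point exactly when the smallest circle enclosing the three centres has radius at most $\varepsilon$.

```python
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def min_enclosing_radius(a, b, c):
    """Radius of the smallest circle containing three planar points."""
    x, y, z = sorted([dist(a, b), dist(b, c), dist(a, c)])  # z is the longest side
    # Right, obtuse or degenerate triangle: the longest side is a diameter.
    if z * z >= x * x + y * y:
        return z / 2.0
    # Acute triangle: circumradius R = xyz / (4 * area), area by Heron's formula.
    s = (x + y + z) / 2.0
    area = math.sqrt(s * (s - x) * (s - y) * (s - z))
    return x * y * z / (4.0 * area)

def rips_has_triangle(a, b, c, eps):
    # Rips condition: the balls intersect pairwise.
    return max(dist(a, b), dist(b, c), dist(a, c)) <= 2.0 * eps

def cech_has_triangle(a, b, c, eps):
    # Cech condition: the three balls have a common point.
    return min_enclosing_radius(a, b, c) <= eps

# Equilateral triangle with unit sides.
a, b, c = (0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3.0) / 2.0)
rips_threshold = 0.5                              # smallest eps with the Rips triangle
cech_threshold = min_enclosing_radius(a, b, c)    # smallest eps with the Cech triangle
ratio = cech_threshold / rips_threshold
```

For the equilateral triangle the ratio of the two thresholds is $\sqrt{4/3}$, which is the value of $\sqrt{2d/(d+1)}$ for $d = 2$, so in this example the bound quoted above is attained.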

There are occasions when Rips and Čech complexes coincide, as is the case when 
$X$ is Euclidean but the metric is the $L^\infty$ rather than the more standard $L^2$ norm. 
In many statistical applications the choice of metric on $X$ may be dictated by 
optimality considerations rather than 'natural' geometry. 

The main importance of the Čech complex and its relevance to homology theory 
is given in the next theorem. 

Theorem 2.3 (The Nerve Theorem). Suppose that the intersections $\bigcap_{x \in \mathcal{V}'} B_\varepsilon(x)$ 
are either empty or contractible for any subset $\mathcal{V}'$ of $\mathcal{V}$. Then the Čech complex 
$C(\mathcal{V},\varepsilon)$ is homotopy equivalent to $\bigcup_{x \in \mathcal{V}} B_\varepsilon(x)$. In particular, if $X$ is a finite 
dimensional normed linear space, or a compact Riemannian manifold with convexity 
radius greater than $\varepsilon$, and if $\{B_\varepsilon(x)\}_{x \in \mathcal{V}}$ is a cover of the space $X$, then $C(\mathcal{V},\varepsilon)$ is 
homotopy equivalent to $X$. 

The main consequence of the Nerve Theorem is that in order to study the 
homology of the topological space $\bigcup_{x \in \mathcal{V}} B_\varepsilon(x)$, we can study the homology of the 
combinatorial space $C(\mathcal{V},\varepsilon)$. This fact can be useful in proving theoretical results, 
but its main contribution is to computational applications. 

With these definitions behind us, Figure 2.4 gives a nice example of how barcodes 
describe the topology of an annulus ($\beta_0 = 1$, $\beta_1 = 1$, $\beta_2 = 0$) in $\mathbb{R}^2$, when 17 points 
are sampled from it and Rips complexes are computed for a range of $\varepsilon$. 
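
To indicate how such a computation can be organised, here is a minimal Python sketch of our own; it is not the software used for Figure 2.4. As a stand-in for a random sample we assume 12 equally spaced points on the unit circle, we take $\mathbb{Z}_2$ coefficients, and we compute only $\beta_0$ and $\beta_1$, for which simplices up to dimension 2 suffice. The names `rank_gf2` and `rips_betti` are hypothetical.

```python
import math
from itertools import combinations

def rank_gf2(columns):
    """Rank over Z_2 of a matrix whose columns are given as integer bitmasks."""
    pivots = {}  # leading bit -> reduced column with that leading bit
    for col in columns:
        while col:
            top = col.bit_length() - 1
            if top not in pivots:
                pivots[top] = col
                break
            col ^= pivots[top]  # cancel the leading bit and keep reducing
    return len(pivots)

def rips_betti(points, eps):
    """(beta_0, beta_1) of the Rips complex R(points, eps), Z_2 coefficients."""
    n = len(points)

    def close(i, j):
        (x0, y0), (x1, y1) = points[i], points[j]
        # Closed eps-balls intersect iff the centres are within 2 * eps.
        return math.hypot(x0 - x1, y0 - y1) <= 2.0 * eps

    edges = [e for e in combinations(range(n), 2) if close(*e)]
    index = {e: k for k, e in enumerate(edges)}
    triangles = [t for t in combinations(range(n), 3)
                 if all(p in index for p in combinations(t, 2))]
    # Boundary matrices: each column lists the faces of one simplex.
    d1 = [(1 << i) | (1 << j) for i, j in edges]
    d2 = [sum(1 << index[p] for p in combinations(t, 2)) for t in triangles]
    r1, r2 = rank_gf2(d1), rank_gf2(d2)
    # beta_0 = #vertices - rank d1, beta_1 = dim ker d1 - rank d2.
    return n - r1, len(edges) - r1 - r2

circle = [(math.cos(2 * math.pi * k / 12), math.sin(2 * math.pi * k / 12))
          for k in range(12)]
betti_by_eps = {eps: rips_betti(circle, eps) for eps in (0.2, 0.3, 0.55, 1.1)}
```

In this toy example the pair $(\beta_0, \beta_1)$ is $(12, 0)$ for $\varepsilon = 0.2$, $(1, 1)$ for $\varepsilon = 0.3$ and $\varepsilon = 0.55$, and $(1, 0)$ for $\varepsilon = 1.1$, mirroring the 'too small', 'just right' and 'too big' regimes of Figure 2.3.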

3. Random field simulations 

In this section we want to consider the persistent homology of random field excur- 
sion sets. In particular, we would like to understand something about the distribu- 
tional properties of their barcodes. 

The random fields behind the barcodes of Figures 2.1 and 2.2 were taken to be 
mean zero, Gaussian, over the parameter set $[0,1]^2$ and with covariance function 
$R(p) = \exp(-\alpha\|p\|^2)$. This is a stationary, isotropic, and infinitely differentiable 
random field, and the starting test case for all theories. We took $\alpha = 100$. 
Fig 2.4. The barcode of a Rips complex, taken from [23]. The points were sampled from an annulus 
in $\mathbb{R}^2$. We see that there is a single $H_0$ bar that persists forever. This bar represents the single 
connected component of the annulus. In $H_1$ we see a couple of dominant bars indicating that the 
sample space contains holes. The longest bar actually represents the real hole of the annulus. In 
$H_2$ there is nothing significant and indeed $\beta_2 = 0$ in this case. 



We ran 10,000 simulations of this field, calculating 10,000 barcodes. In order to 
represent the data in a reasonable fashion, we used persistence diagrams rather than 
barcodes. To form a persistence diagram from the bars in $H_k$, one simply replaces 
each bar by a pair $(x,y)$, where $x$ is the level at which the bar begins and $y$ the 
level at which it ends. Thus $x > y$ and the pair $(x, y)$ lies in a half plane. In Figure 
3.1 the corresponding persistence diagrams for the complete simulation data are 
shown for $H_0$ and $H_1$. 
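
To illustrate the passage from excursion sets to the pairs $(x, y)$, here is a minimal Python sketch of our own for the simplest possible setting: a function sampled on a line, for which only $H_0$ is of interest. It is not the code used for the simulations. It assumes distinct sample values, lowers the level $u$ through the samples, and applies the rule described in Section 2.1 that, when two components merge, the one that appeared first continues. The name `superlevel_h0_pairs` is hypothetical.

```python
def superlevel_h0_pairs(values):
    """(start, end) levels of the H_0 bars of the super-level sets of a sampled function."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    parent = {}  # union-find forest over the indices already in the excursion set
    birth = {}   # root index -> level at which its component appeared

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    for i in order:  # lower the level u through the sample values
        parent[i] = i
        birth[i] = values[i]
        for j in (i - 1, i + 1):
            if j in parent:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # The component born at the higher level survives the merge.
                    old, young = (ri, rj) if birth[ri] >= birth[rj] else (rj, ri)
                    if birth[young] > values[i]:  # skip bars of zero length
                        pairs.append((birth[young], values[i]))
                    parent[young] = old
    # The component of the global maximum never dies: an infinite bar.
    pairs.append((birth[find(order[0])], float("-inf")))
    return sorted(pairs, reverse=True)

diagram = superlevel_h0_pairs([0.0, 3.0, 1.0, 2.0, 0.5, 4.0, -1.0])
```

In this example the bars start at the local maxima 4, 3 and 2. The bar starting at 2 ends at the local minimum 1, the bar starting at 3 ends at the local minimum 0.5, and the bar starting at the global maximum is infinite.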




Fig 3.1. Persistence diagrams for 10,000 simulations of an isotropic random field on the unit 
square. Note that the diagrams for $H_0$ and $H_1$ seem quite different. 

Additional information on the barcodes is given in Figure 3.2. What is shown 
there are the (marginal) distributions of the start and end points of the barcodes 
for $H_0$ and $H_1$ from the same simulation. A simple application of Morse theory, or, 
in this simple two dimensional setting, a little thought, leads to the realisation that 
the start points of the $H_0$ bars are all heights of local maxima of the field, while 
the end points of the $H_1$ bars correspond to local minima. These distributions have 
been well studied (although their precise form is not known) in the general theory of 
Gaussian random fields. The remaining start and end points correspond to different 
types of saddle points of the random field. However, what differentiates between 
the end point of an $H_0$ bar and the start point of an $H_1$ bar is global geometry and 
is not determined by the local behaviour of the field. 



[Figure 3.2 panels: Dim 0 bars start point histogram (infinite bars included); Dim 0 bars end point histogram (infinite bars included); Dim 1 bars start point histogram (infinite bars included); Dim 1 bars end point histogram (infinite bars excluded).] 

Fig 3.2. Empirical distributions of start and end points of bars for the Gaussian field of Figure 
3.1. 

It would be interesting to know more about the real distributions lying behind 
Figures 3.1 and 3.2, but at this point we know very little. An interesting aspect is 
the asymmetries between the start points of the $H_0$ bars (local maxima) and the 
end points of the $H_1$ bars (local minima), as well as between the two sets of saddle 
points. We imagine that this is due to boundary effects and would disappear if the 
simulation had been carried out on a closed manifold. 

There are some things that we do know, however, and we turn to them next. 
Firstly, however, we need to make a small digression. 

4. Euler Integration 

Before we introduce the Euler integral, we need to define the Euler characteristic for 
noncompact spaces. For a compact space $X$, we already defined the Euler 
characteristic $\chi(X)$ in (2.1) as an alternating sum of Betti numbers; viz. as an alternating 
sum of ranks of the homology groups $H_k(X)$. 

In this setting the Euler characteristic is a homotopy invariant and is additive 
in the sense that 

(4.1)    $\chi(A \cup B) = \chi(A) + \chi(B) - \chi(A \cap B)$. 
In extending the definition of the Euler characteristic to noncompact spaces, 
if one uses the definition of the Euler characteristic as the alternating sum of 
$\operatorname{rank} H_k(X)$, then additivity is lost (consider $[0,1] = [0,1) \cup \{1\}$). Therefore the 
definition of the Euler characteristic we shall use for noncompact spaces is 

$\chi(X) = \sum_k (-1)^k \operatorname{rank} H_k^{\mathrm{lf}}(X) = \sum_k (-1)^k \, (\#\text{ of open } k\text{-simplices in } X)$, 

where $H_k^{\mathrm{lf}}(X)$ is called the locally finite homology (see [27, Chapter 3]). Since 
$H_k^{\mathrm{lf}}(X) = H_k(X)$ for compact spaces, this extends the definition of the Euler 
characteristic to noncompact spaces such that additivity is preserved, but we lose 
homotopy invariance, although it is still a homeomorphism invariant. This Euler 
characteristic can also be computed by decomposing the space into a union of open 
$k$-simplices and points. For example, $\chi((0,1)) = -1$, $\chi([0,1)) = 0$ and $\chi([0,1]) = 1$. 
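
The counting recipe is easy to try out in one dimension. The following minimal Python sketch is our own illustration; the representation of a subset of the line as a list of cells and the names `interval` and `euler_characteristic` are hypothetical. Each point counts $+1$ and each open interval counts $-1$.

```python
def interval(a, b, left_closed, right_closed):
    """Decompose an interval from a to b into points and one open 1-simplex."""
    cells = [("open", a, b)]
    if left_closed:
        cells.append(("point", a))
    if right_closed:
        cells.append(("point", b))
    return cells

def euler_characteristic(cells):
    """Alternating count: +1 for each point, -1 for each open interval."""
    return sum(1 if cell[0] == "point" else -1 for cell in cells)

chi_open = euler_characteristic(interval(0, 1, False, False))       # (0, 1)
chi_half_open = euler_characteristic(interval(0, 1, True, False))   # [0, 1)
chi_closed = euler_characteristic(interval(0, 1, True, True))       # [0, 1]
chi_point = euler_characteristic([("point", 1)])                    # {1}
```

With this definition the decomposition $[0,1] = [0,1) \cup \{1\}$ is additive, since the half-open interval contributes 0 and the point contributes 1.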



4.1. The Euler Integral 

Since the Euler characteristic is an additive operator on sets (cf. (4.1)), it is 
tempting to consider $\chi$ as a measure and integrate with respect to it. The main problem 
in doing so is that $\chi$ is only finitely additive. 

At first (cf. [38]), integration with respect to the Euler characteristic was defined 
for a small set of functions called constructible functions defined by 

$CF(X) = \{ h(x) = \sum_{k=1}^{n} a_k \mathbf{1}_{A_k}(x) \;:\; n \in \mathbb{N},\ a_k \in \mathbb{Z},\ A_k \text{ is tame} \}$, 

where 'tame' means having a finite Euler characteristic. For this set of functions 
we can define the Euler integral similarly to the Lebesgue integral. Let 
$h(x) = \sum_{k=1}^{n} a_k \mathbf{1}_{A_k}(x)$ and define 

$\int_X h \, d\chi = \sum_{k=1}^{n} a_k \chi(A_k)$. 


This integral has many nice properties, similar to those of the Lebesgue 
integral, such as linearity and a version of the Fubini theorem (cf. [24, 38]). However, 
due to the lack of countable additivity one cannot easily continue from here by 
approximation to integrate more general functions. 
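
As a small worked example of our own, take $X = \mathbb{R}$ and $h = \mathbf{1}_{[0,2]} + \mathbf{1}_{[1,3]}$. The first line below uses this representation directly. The second uses the level sets of $h$, which equals 1 on $[0,1) \cup (2,3]$ and 2 on $[1,2]$, together with the Euler characteristic for noncompact sets defined above. Both give the same value.

```latex
\begin{align*}
\int_X h \, d\chi &= \chi([0,2]) + \chi([1,3]) = 1 + 1 = 2, \\
\int_X h \, d\chi &= 1 \cdot \chi\big([0,1) \cup (2,3]\big) + 2 \cdot \chi([1,2])
                   = 1 \cdot (0 + 0) + 2 \cdot 1 = 2.
\end{align*}
```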

Nevertheless, in [3] two possible extensions were suggested for the Euler integral 
of real valued functions. We shall not go into details of the constructions here, but 
rather use one of the properties of these extensions to define what in [3] was called 
an upper Euler integral and which we, for simplicity, shall call an Euler integral. 
Thus, we define the Euler integral by 

(4.2)    $\int_X f \, d\chi = \int_{u=0}^{\infty} [\chi(f > u) - \chi(f \le -u)] \, du$, 

where $\chi(f > u) = \chi(f^{-1}((u,\infty)))$ and $\chi(f \le -u) = \chi(f^{-1}((-\infty,-u]))$. These 
integrals are defined for what are known as 'tame' functions. See [3] for details. 

Unfortunately, these extensions of the Euler integral have many flaws, of which 
the most prominent one is the lack of additivity. For example, a simple computation 
shows that for $X = [0,1]$ 

$\int_X x \, d\chi + \int_X (1-x) \, d\chi = 1 + 1 = 2 \neq 1 = \int_X 1 \, d\chi$. 

Nevertheless, these integrals still have interesting properties and some interesting 
applications. Here is one. 

4.2. An application of the Euler integral 

This application was suggested in [4]. Suppose that an unknown number of targets 
are located in a space $X$ and each target $\alpha$ is represented by its support $U_\alpha \subset X$. 
Suppose also that the space $X$ is covered with sensors, each reporting only the 
number of targets it can sense, but with no ability to distinguish between targets. 
Let $h : X \to \mathbb{Z}$ be the sensor field, i.e. 

$h(x) = \#\{\text{targets activating the sensor located at } x\}$. 

The following theorem states how to combine the readings from all the sensors and 
get the exact number of targets. 

Theorem 4.1 (Baryshnikov and Ghrist, [4]). If all the target supports $U_\alpha$ satisfy 
$\chi(U_\alpha) = \gamma$ for some $\gamma \neq 0$, then 

$\#\{\text{targets}\} = \frac{1}{\gamma} \int_X h \, d\chi$. 

Note that we do not need to assume anything about the targets other than they 
all have the same nonzero Euler characteristic. For example, we need not assume 
that they are all convex or even have the same number of connected components. 
On the other hand, the theorem assumes an ideal sensor field, in the sense that the 
entire (generally continuous) space $X$ is covered with sensors which register only 
what happens at the point at which they are placed. In [3] more realizable models 
using the upper and lower Euler integrals are discussed. 
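
A discrete version of Theorem 4.1 can be checked in a few lines. The following Python sketch is our own illustration and not the construction of [4]. It assumes that the targets are closed axis-aligned rectangles with integer corners (so $\gamma = 1$), and that an ideal sensor sits on every vertex, open edge and open unit square of a grid. Since $h$ is constant on each such cell, and an open $k$-cell has Euler characteristic $(-1)^k$, the Euler integral reduces to a signed sum of readings. The names `sensor_field` and `euler_integral` are hypothetical.

```python
def sensor_field(rectangles, width, height):
    """Sensor counts on the cells of the grid [0, width] x [0, height].

    Doubled coordinates: index (i, j) is a vertex when i and j are both even,
    an open edge when exactly one is odd, and an open unit square when both are odd.
    Rectangles are (x0, y0, x1, y1) with integer corners inside the grid.
    """
    h = [[0] * (2 * height + 1) for _ in range(2 * width + 1)]
    for x0, y0, x1, y1 in rectangles:
        for i in range(2 * x0, 2 * x1 + 1):
            for j in range(2 * y0, 2 * y1 + 1):
                h[i][j] += 1  # this cell lies in the closed rectangle
    return h

def euler_integral(h):
    """Integrate a cell-wise constant function against chi."""
    total = 0
    for i, column in enumerate(h):
        for j, reading in enumerate(column):
            cell_dim = (i % 2) + (j % 2)
            total += (-1) ** cell_dim * reading  # chi(open k-cell) = (-1)^k
    return total

# Three overlapping targets; the sensors never see which target is which.
targets = [(0, 0, 3, 2), (2, 1, 5, 4), (1, 1, 2, 3)]
h = sensor_field(targets, width=6, height=5)
estimated_count = euler_integral(h)  # gamma = 1 for rectangles
```

Note that `euler_integral` uses only the readings `h`, not the list of targets, and that the count is recovered however the rectangles overlap.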

Assume now that the readings from the sensors are contaminated by a Gaussian 
(or Gaussian related) noise $f(x)$. Under these conditions it can be proved that 

$\int_X (h + f) \, d\chi = \int_X h \, d\chi + \int_X f \, d\chi$. 

Denoting $s = \int_X h \, d\chi$ (deterministic signal), $n = \int_X f \, d\chi$ (noise) and 
$y = \int_X (h + f) \, d\chi$ (measurement), this is a classic signal plus noise problem (i.e. $y = s + n$). 
In particular, in order to estimate $s$ from $y$, it would be nice, in view of Theorem 
4.1, to be able to compute some distributional properties of the Euler integral of a 
Gaussian random field. We shall limit ourselves to computing the expectation and 
shall turn to this after a few words on the Gaussian kinematic formula. 

4.3. The Gaussian kinematic formula 

Suppose that $M$ is an $N$-dimensional, $C^2$, Whitney stratified manifold satisfying 
some mild side conditions (cf. [2] for details) and $D$ a similarly nice stratified 
submanifold of $\mathbb{R}^k$. Let $f = (f^1, \ldots, f^k) : M \to \mathbb{R}^k$ be a vector valued random process, 
the components of which are independent, identically distributed, real valued, $C^2$, 
centered, unit variance, Gaussian processes. Using $f$, define a Riemannian metric 
on $M$ by setting 

(4.3)    $g_x(X,Y) \triangleq E\{(X f^i_x)(Y f^i_x)\}$, 
for any $i$ and for $X, Y \in T_xM$, the tangent space to $M$ at $x \in M$, and use this to 
define the Lipschitz-Killing curvatures, $\mathcal{L}_j$, $j = 0, \ldots, N$, on $M$. For example, if $M$ 
is a manifold without boundary, then these are given by 

(4.4)    $\mathcal{L}_j(M) = \frac{(2\pi)^{-(N-j)/2}}{((N-j)/2)!} \int_M \mathrm{Tr}^M\big((-R)^{(N-j)/2}\big) \, \mathrm{Vol}_g$ 

when $N - j \ge 0$ is even, and 0 otherwise. Here $\mathrm{Vol}_g$ is the volume form of the 
Riemannian manifold $(M, g)$, $R$ is the curvature tensor and $\mathrm{Tr}^M$ the trace operator 
on the algebra of double forms on $M$. For simple Euclidean spaces, with various 
orderings and normalisations, the Lipschitz-Killing curvatures are also known as 
Quermassintegrale, Minkowski or Steiner functionals, integral curvatures, and 
intrinsic volumes. Note that $\mathcal{L}_N(M) = \mathrm{Vol}_g(M)$ is the Riemannian volume of $M$ and 
$\mathcal{L}_0(M) = \chi(M)$ is its Euler characteristic. 

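The Gaussian kinematic formula (hereafter GKF) was due originally to Taylor 
in [35] (but for the form below see [2, 36]) and states that 

(4.5)    $E\{\mathcal{L}_i(M \cap f^{-1}(D))\} = \sum_{j=0}^{N-i} \begin{bmatrix} i+j \\ j \end{bmatrix} (2\pi)^{-j/2} \mathcal{L}_{i+j}(M) \mathcal{M}_j^{\gamma}(D)$. 

The combinatorial coefficients here are the standard 'flag coefficients' of integral 
geometry, given by 

$\begin{bmatrix} n \\ j \end{bmatrix} = \binom{n}{j} \frac{\omega_n}{\omega_{n-j} \, \omega_j}$, 

where $\omega_n$ is the volume of the unit ball in $\mathbb{R}^n$. The $\mathcal{M}_j^{\gamma}(D)$, known as the Gaussian 
Minkowski functionals of $D$, are determined via the tube expansion 

(4.6)    $P\{f(x) \in \{y : d(y,D) \le \rho\}\} = \sum_{j=0}^{\infty} \frac{\rho^j}{j!} \mathcal{M}_j^{\gamma}(D)$, 

where $x$ is any point in $M$ and $d$ is the usual Euclidean distance from a point to a 
set. 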

One could devote a book to this formula and, indeed, such a book exists. So we 
shall refer you to [2] for all needed technical details. 

We note only one pertinent fact, for immediate use. Taking $i = 0$ in (4.5) gives the 
expected Euler characteristic of excursion sets as a simple, closed form expression 
that can be readily calculated in many interesting cases. Again, see [2] for details. 
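
As an indication of what such a closed form looks like, here is a minimal Python sketch of our own for a real valued, isotropic, unit variance field on a square of side $T$, with $D = [u,\infty)$. It relies on standard facts that are not stated in this paper and which we recall as assumptions (see [2]): for $j \ge 1$ the Gaussian Minkowski functionals of $[u,\infty)$ are $(2\pi)^{-1/2} H_{j-1}(u) e^{-u^2/2}$, with $H_0(u) = 1$ and $H_1(u) = u$, and the Lipschitz-Killing curvatures of the square in the metric (4.3) are $\mathcal{L}_0 = 1$, $\mathcal{L}_1 = 2T\lambda_2^{1/2}$ and $\mathcal{L}_2 = T^2\lambda_2$, where $\lambda_2$ is the second spectral moment. For the covariance $\exp(-\alpha\|p\|^2)$ of Section 3, $\lambda_2 = 2\alpha$. The function name is hypothetical.

```python
import math

def expected_euler_characteristic(u, lambda2, side=1.0):
    """E{chi(A_u)} on a square, under the assumptions listed in the text above."""
    gauss = math.exp(-u * u / 2.0)
    tail = 0.5 * math.erfc(u / math.sqrt(2.0))  # P{N(0,1) >= u}, the j = 0 term
    area_term = side ** 2 * lambda2 * u / (2.0 * math.pi) ** 1.5      # j = 2 term
    boundary_term = 2.0 * side * math.sqrt(lambda2) / (2.0 * math.pi)  # j = 1 term
    return (area_term + boundary_term) * gauss + tail

# Covariance exp(-alpha * |p|^2) with alpha = 100 gives lambda2 = 200.
curve = [(u / 2.0, expected_euler_characteristic(u / 2.0, 200.0)) for u in range(-8, 9)]
```

For very low levels the value tends to 1, the Euler characteristic of the square itself, and for very high levels it tends to 0, since the excursion set is then empty with high probability.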



4.4. The Euler integral of a Gaussian random field 

Returning to the signal plus noise problem of Section 4.2, we can formulate the first 
step towards its solution. 

Let $M$ be a nice, tame, space. (The definition of 'tame' can be found in [2].) Let 
$f$ be a random field. Here is a striking result, due to Bobrowski and Borman [6]: 

Theorem 4.2. Let $M$ be an $N$-dimensional tame stratified space and let $f : M \to \mathbb{R}^k$ 
be a $k$-dimensional Gaussian random field satisfying the GKF conditions. Let 
$G : \mathbb{R}^k \to \mathbb{R}$ be piecewise $C^2$ and let $g = G \circ f$. Setting $D_u = G^{-1}(-\infty,u]$ and 
assuming that $|\int_{\mathbb{R}} \mathcal{M}_j^{\gamma}(D_u) \, du| < \infty$, we have 

(4.7)    $E\{\int_M g \, d\chi\} = \chi(M) E\{g\} - \sum_{j=1}^{N} (2\pi)^{-j/2} \mathcal{L}_j(M) \int_{\mathbb{R}} \mathcal{M}_j^{\gamma}(D_u) \, du$, 

where $E\{g\} := E\{g(t)\}$ ($g(t)$ has constant mean). 

While, on the one hand, this is not a difficult result to prove, given the GKF and 
(4.2), it was completely unforeseen until discovered and has a number of interesting 
and potentially deep implications. 

The main difficulty in applying Theorem 4.2 lies in computing the Minkowski 
functionals $\mathcal{M}_j^{\gamma}(D_u)$. A simple example is given in the following case: 

Theorem 4.3. Let $M$ be an $N$-dimensional tame stratified space and let $f : M \to \mathbb{R}$ 
be a real valued Gaussian random field satisfying the GKF conditions. Let $G : \mathbb{R} \to \mathbb{R}$ 
be piecewise $C^2$ and let $g = G \circ f$. Then 

$E\{\int_M g \, d\chi\} = \chi(M) E\{g\} + \sum_{j=1}^{N} (-1)^j \mathcal{L}_j(M) \frac{\langle H_{j-1}, (\mathrm{sgn}(G'))^j G' \rangle}{(2\pi)^{j/2}}$. 

In the theorem, for $n \ge 0$ the $n$th Hermite polynomial is defined by 

$H_n(x) = (-1)^n \varphi^{-1}(x) \frac{d^n}{dx^n} \varphi(x)$, 

where $\varphi$ is the standard Gaussian density, $e^{-x^2/2}/\sqrt{2\pi}$. The inner product is given 
by 

(4.8)    $\langle f, g \rangle = \int_{\mathbb{R}} f(x) g(x) \varphi(x) \, dx$, 

and we use the convention, required below, that 

$H_{-1}(x) = \varphi^{-1}(x) \int_x^{\infty} \varphi(u) \, du$. 


In the case that the function $G$ is strictly monotone, we have an even simpler 
form. 

Corollary 4.4. Let $f$ be as in Theorem 4.3 and $G$ be a strictly increasing function, 
then 

$E\{\int_M g \, d\chi\} = \sum_{j=0}^{N} (-1)^j \mathcal{L}_j(M) \frac{\langle H_j, G \rangle}{(2\pi)^{j/2}}$. 

If $G$ is strictly decreasing then, 

$E\{\int_M g \, d\chi\} = \sum_{j=0}^{N} \mathcal{L}_j(M) \frac{\langle H_j, G \rangle}{(2\pi)^{j/2}}$. 

Finally, taking $G$ to be the identity function yields 

Corollary 4.5. Let $f$ be as in Theorem 4.3, then 

$E\{\int_M f \, d\chi\} = -\frac{\mathcal{L}_1(M)}{\sqrt{2\pi}}$. 

Further details and further examples, including results for $\chi^2$ and $F$ random 
fields, can be found in [6]. 
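
The step from the increasing case of Corollary 4.4 to Corollary 4.5 rests on the orthogonality of the Hermite polynomials: for the identity function only $\langle H_1, G \rangle$ is nonzero. The following minimal Python sketch of our own checks this numerically. It assumes the three-term recurrence $H_{n+1}(x) = x H_n(x) - n H_{n-1}(x)$, which follows from the definition of $H_n$ above, approximates (4.8) by the trapezoidal rule on $[-10, 10]$, and uses hypothetical values for the Lipschitz-Killing curvatures. All function and variable names are ours.

```python
import math

def hermite(n, x):
    """H_n(x) for n >= 0 via H_{n+1}(x) = x * H_n(x) - n * H_{n-1}(x)."""
    h_prev, h = 0.0, 1.0  # h starts at H_0; h_prev is unused while k = 0
    for k in range(n):
        h_prev, h = h, x * h - k * h_prev
    return h

def inner_product(f, g, lo=-10.0, hi=10.0, steps=20000):
    """Trapezoidal approximation of the Gaussian inner product (4.8)."""
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * dx
        weight = 0.5 if k in (0, steps) else 1.0
        phi = math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
        total += weight * f(x) * g(x) * phi
    return total * dx

# <H_j, G> for G the identity: only j = 1 survives.
coeffs = [inner_product(lambda x, j=j: hermite(j, x), lambda x: x) for j in range(4)]

# Hypothetical Lipschitz-Killing curvatures L_0, L_1, L_2, for illustration only.
lk = [1.0, 2.0, 1.0]
expected_integral = sum((-1) ** j * lk[j] * coeffs[j] / (2.0 * math.pi) ** (j / 2.0)
                        for j in range(3))
```

With these illustrative values the sum collapses to $-\mathcal{L}_1/\sqrt{2\pi}$, in agreement with Corollary 4.5.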
4.5. Persistent homology of Gaussian excursion sets 

We now return to excursion sets, which would seem to have been forgotten in all the 
discussion on Euler integrals. It turns out that this was not the case, but that Euler 
integrals and Theorem 4.3 contain a lot of information on the persistent homology 
of Gaussian excursion sets. 

First, however, some notation and a definition: Suppose we have a barcode, which 
we shall denote by $B$. Denote the individual bars in $B$ by $b$, their lengths by $\ell(b)$, 
and the degree of the homology group to which belongs the generator that they 
represent by $\mu(b)$. 

Definition 4.6. The Euler characteristic of a barcode $B$ with no bars of infinite 
length is 

$\chi(B) \triangleq \sum_{b \in B} (-1)^{\mu(b)} \ell(b)$. 

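
Definition 4.6 translates directly into code. The following minimal Python sketch is our own illustration; the representation of a bar as a triple (degree, start, end) and the function name are hypothetical.

```python
import math

def barcode_euler_characteristic(bars):
    """Alternating sum of bar lengths; each bar is (degree, start, end)."""
    total = 0.0
    for degree, start, end in bars:
        if math.isinf(start) or math.isinf(end):
            raise ValueError("Definition 4.6 requires bars of finite length")
        total += (-1) ** degree * abs(end - start)
    return total

# Two H_0 bars of lengths 3 and 1, and one H_1 bar of length 1.
example_bars = [(0, 0.0, 3.0), (0, 1.0, 2.0), (1, 1.5, 2.5)]
chi_b = barcode_euler_characteristic(example_bars)
```

Here the $H_0$ bars contribute $3 + 1$ and the $H_1$ bar contributes $-1$, so that $\chi(B) = 3$.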

It turns out that this topological quantity can actually be written in terms of 
Euler integrals. For convenience, and adopting the prejudices of topologists rather 
than probabilists and statisticians, we shall consider the barcodes of incursion rather 
than excursion sets, or, equivalently, sub-level sets rather than super-level sets. That 
is, we replace excursion sets of (2.2) by 

(4.9)    $A_u \triangleq \{p \in M : f(p) \in (-\infty,u]\} = f^{-1}((-\infty,u])$. 

Then [6] showed that, if $B(f,u)$ denotes the barcode of $A_u$ for tame $f$, 

(4.10)    $\chi(B(f, f_{\max})) = f_{\max} \, \chi(M) - \int_M f \, d\chi$. 

Combining this with Theorem 4.2 yields 

Theorem 4.7. Let $f : M \to \mathbb{R}^k$ be a Gaussian random field satisfying the GKF 
conditions, $G \in C^2(\mathbb{R}^k, \mathbb{R})$ and $g = G \circ f$. Then 

$E\{\chi(B(g, g_{\max}))\} = \chi(M) \, (E\{g_{\max}\} - E\{g\}) + \sum_{j=1}^{N} (2\pi)^{-j/2} \mathcal{L}_j(M) \int_{\mathbb{R}} \mathcal{M}_j^{\gamma}(D_u) \, du$. 

If $f$ is real, then 

$E\{\chi(B(f,a))\} = \chi(M) \, (\varphi(a) + a\Phi(a)) + \varphi(a) \sum_{j=1}^{N} (2\pi)^{-j/2} \mathcal{L}_j(M) H_{j-2}(-a)$, 

for any $a$, where $\Phi$ is the standard Gaussian distribution function. 
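
The real valued case of Theorem 4.7 is explicit enough to evaluate numerically. The following minimal Python sketch is our own illustration and not code from [6]. It takes the Euler characteristic $\chi(M)$ and the Lipschitz-Killing curvatures $\mathcal{L}_1, \ldots, \mathcal{L}_N$ as inputs, uses the convention for $H_{-1}$ given after Theorem 4.3, and assumes the recurrence $H_{n+1}(x) = x H_n(x) - n H_{n-1}(x)$ for $n \ge 0$. The values used for the unit square, $\mathcal{L}_1 = 2\lambda_2^{1/2}$ and $\mathcal{L}_2 = \lambda_2$ with $\lambda_2 = 200$, are assumptions recalled from the general theory (see [2]) and are not derived in this paper. All names are hypothetical.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)

def hermite_ext(n, x):
    """H_n(x) for n >= -1, with H_{-1}(x) = (1 / phi(x)) * P{N(0,1) >= x}."""
    if n == -1:
        phi = math.exp(-x * x / 2.0) / SQRT_2PI  # underflows for very large |x|
        return 0.5 * math.erfc(x / math.sqrt(2.0)) / phi
    h_prev, h = 0.0, 1.0
    for k in range(n):
        h_prev, h = h, x * h - k * h_prev
    return h

def expected_barcode_ec(a, chi_m, lk):
    """E{chi(B(f, a))} for real f; lk[j - 1] holds L_j(M) for j = 1, ..., N."""
    phi = math.exp(-a * a / 2.0) / SQRT_2PI
    cdf = 0.5 * math.erfc(-a / math.sqrt(2.0))  # standard Gaussian distribution function
    total = chi_m * (phi + a * cdf)
    for j, l_j in enumerate(lk, start=1):
        total += phi * (2.0 * math.pi) ** (-j / 2.0) * l_j * hermite_ext(j - 2, -a)
    return total

# Unit square with lambda2 = 200 (assumed values, see the text above).
lk_square = [2.0 * math.sqrt(200.0), 200.0]
value = expected_barcode_ec(8.0, 1.0, lk_square)
```

As a consistency check, for large $a$ the value behaves like $a\chi(M) + \mathcal{L}_1(M)/\sqrt{2\pi}$, which is what (4.10) and Corollary 4.5 together suggest once $a$ exceeds the maximum of the field with high probability.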

The way we have presented this result, as a 'natural' consequence of a reasonably 
'straightforward' Theorem 4.2, substantially underplays its importance and novelty. 
It has a number of interesting corollaries, for which we send you to the original 
paper [6]. But its main contribution lies in its very existence, connecting, as it does, 
between probabilistic objects and their homological structure. 

As one of our colleagues/teachers recently stated: "I can think of no two top- 
ics in mathematics further away from one another than probability and algebraic 
topology. There is probably no way to connect them." Yet here, in Theorem 4.7, is 
an elegant connection, one of the first of its kind. 
5. Random Geometric Complexes 

In Sections 2.2 and 2.3 we motivated the idea of persistent homology and barcodes 
with examples from manifold learning and random simplicial complexes. Despite 
this, we shall not go into detail, but shall rather describe some general issues and 
give you a few useful references to this area, which is also currently undergoing 
rapid development. 

5.1. Manifold learning 

We already mentioned manifold learning briefly in Section 2.2 via the example of 
trying to identify an annulus, or at least its homology, from a simplicial complex 
built over the sample points. The subject of manifold learning goes, obviously, well 
beyond such an example, and examples of algorithms for 'estimating' an underlying 
manifold from a finite sample abound in the statistics and computer science
literatures. Very few of them, however, take an algebraic point of view, which is what
we have stressed in this paper.

One contribution in the spirit of this paper is [30] by Niyogi, Smale, and
Weinberger, who studied the problem of estimating smooth manifolds from finite
samples. They showed that in sampling from a high dimensional manifold, if the
sampling is dense enough then, with high probability, the set (2.3) deformation retracts
to the manifold and so has the same homology. This implies that long persistence
intervals, once one has enough sample points, are very likely to correctly compute
the homology of a submanifold.

Of course, one of the most important issues in dealing with data is noise. In 
the setting of manifold learning this translates to the sample points possibly not 
coming from the submanifold that theoretically models the phenomenon because 
of experimental, measurement, or other error. In [31] the same authors treat this 
issue, as does [14] from a different and enlightening point of view. 

In a complementary but related direction [8, 9] apply persistence techniques to 
the nonparametric study of functions on a given manifold. 

5.2. Random complexes 

We now return to the Čech and Rips complexes of Section 2.3.

To get a feeling for the phenomena that occur as we approximate a manifold by
the union of balls, it is perhaps enlightening to consider the situation, for a fixed ε,
of the evolution of the homology of the Čech complexes as the number of points, n,
grows. The points themselves we assume are chosen uniformly, at random, on the
manifold.

For n small, the balls do not intersect, but as n grows intersections begin to occur
and small finite graphs appear. Assuming that ε is sufficiently small that all ε-balls
have about the same volume, it is easy to compute the expected number of times a
particular graph arises. This leads to complicated integrals, but investigating them
leads to the belief that k-homology (for a Čech complex) is most likely to occur
as a result of the occurrence of boundaries of (k + 1)-simplices. That is, it requires
k + 1 points to be close to each other (at scale ε) but not to fill in. Aside from a
constant factor, the probability that this happens should be $(\varepsilon^d)^{k(k-1)}$. For this
probability to be non-trivial, one requires that n is $O(1/\varepsilon^{dk/(k+1)})$. In other words,
it is for n of this size that one begins to see interesting k-homology.
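
As a rough numerical illustration of this scaling, the following Python sketch evaluates $\varepsilon^{-dk/(k+1)}$ for a few values of k. The function name `onset_sample_size` is ours, and the constant implied by the O(·) statement is not given in the text, so it is set to 1 here purely as an assumption; only the relative growth in k and ε should be read off.

```python
def onset_sample_size(eps: float, d: int, k: int) -> float:
    """Order of magnitude of n at which k-homology is expected to appear.

    Evaluates only the scaling eps**(-d*k/(k+1)) quoted in the text.
    The implied constant is unknown and is taken to be 1 (assumption).
    """
    if not (0.0 < eps < 1.0) or d < 1 or k < 0:
        raise ValueError("need 0 < eps < 1, d >= 1 and k >= 0")
    return eps ** (-d * k / (k + 1))


if __name__ == "__main__":
    # Hypothetical parameters: balls of radius 0.01 on a 2-dimensional manifold.
    for k in (1, 2, 3):
        print(k, onset_sample_size(0.01, d=2, k=k))
```

For fixed ε and d the exponent dk/(k+1) increases with k, so higher-dimensional homology needs more sample points before it starts to appear.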

As n continues to grow, there will be a point where the data covers a nontrivial
percentage of the volume, a point at which phenomena related to percolation occur.
Finally, there is a reversal when the ε-balls fill almost all of the manifold, all extra
homology dies, and ultimately we obtain the correct calculation of homology.

To get a feeling for the end game, it is worthwhile to compute the expected
Euler characteristic of the union of n ε-balls in, say, a flat torus. For simplicity, as
we are only giving a heuristic, let's avoid the complications of Euclidean metrics
and consider an $L^\infty$ metric on the torus. In that case, a straightforward
inclusion-exclusion argument (see [32] for this in a Poisson model in Euclidean space and
[29] for the use of kinematic formulae to obtain the relevant formulae in the case
of genuine round balls), together with a generating function argument, gives a
formula for $E(\chi_{n,\varepsilon,d})$ in terms of $r = (2\varepsilon)^d$, where $\chi_{n,\varepsilon,d}$ is the Euler
characteristic of the union of n ε-balls and d is the dimension of the torus.



One thus obtains that the Euler characteristic is approximately 0 (for the last time,
and so implying coverage) when n is around $(1/r)\log(1/r)$ plus lower order terms.
See [21] for more details. Interestingly, it follows from the work of [33] that the
phase transition for the giant component to form, in the sense of random graph
theory, is (asymptotically) at $2^{-d}$ times this number. That is, the computation of
components seems to be correct. It also seems extremely likely that there are phase 
transitions at other multiples of this fundamental scale where the other homology 
groups are correctly computed. 
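
To make the heuristic concrete, here is a small Python sketch of what the inclusion-exclusion argument gives under explicit assumptions that are ours rather than the text's: the torus is $[0,1)^d$ with unit volume, the n centres are i.i.d. uniform, the balls are $L^\infty$ balls (cubes of side 2ε) with ε < 1/4, and $r = (2\varepsilon)^d$. Under these assumptions k cubes share a common point with probability $k^d r^{k-1}$, non-empty intersections are contractible, and inclusion-exclusion gives $E(\chi) = \sum_{k=1}^{n} (-1)^{k+1} \binom{n}{k} k^d r^{k-1}$. The normalisation may differ from the formula in [21], and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, log


def expected_euler_characteristic(n: int, r: Fraction, d: int) -> Fraction:
    """Alternating sum over k of C(n, k) * k^d * r^(k-1), in exact arithmetic.

    Exact rationals are used because the alternating binomial sum
    cancels badly in floating point once n is moderately large.
    """
    total = Fraction(0)
    for k in range(1, n + 1):
        total += (-1) ** (k + 1) * comb(n, k) * k ** d * r ** (k - 1)
    return total


def coverage_scale(r: float) -> float:
    """Leading-order sample size (1/r) * log(1/r) mentioned in the text."""
    return (1.0 / r) * log(1.0 / r)


if __name__ == "__main__":
    r = Fraction(1, 50)  # hypothetical value of (2 * eps) ** d
    for n in (1, 10, 50, 100, 200, 400):
        print(n, float(expected_euler_characteristic(n, r, d=2)))
    print("coverage scale:", coverage_scale(float(r)))
```

For d = 1 the sum collapses to $n(1-r)^{n-1}$, which falls below 1 once n is of order $(1/r)\log(1/r)$, in line with the coverage scale described above.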

Many more details of the phenomena for small n and the percolation range, the 
relevant central limit theorems for homology and some valuable information about 
persistence intervals in Rips and Čech situations can be found in recent papers of
Kahle [28] and Bobrowski and Borman [7]. They combine probabilistic tools with 
Morse theory to give rigorous proofs of these phenomena. We mention also an early 
lecture by Diaconis, available on the web [19], which suggests the general outline of 
this picture, and a forthcoming paper [5] that also deals with some aspects of this 
problem in a metric-measure setting (relevant to situations where the distribution 
of points in nature does not follow the Riemannian uniform measure). 

6. Technical appendix 

As promised, we shall now be a little more formal and explain what persistent
homology really is. For this, however, we shall need to assume that the reader has
a basic working familiarity with the theory of simplicial homology. We shall also,
for simplicity, take all homologies over the group $\mathbb{Z}_2$. This is the material of Section 6.1.



The second subsection of the appendix explains how to carry these definitions
over to random complexes. To complete the appendix we should really have added
a section on how one turns the excursion sets of a continuous random field into a
random filtered complex. This, as you will be able to guess after reading the first
two subsections, is done by discretizing the parameter space of the random field
and then thresholding the field at various levels in order to obtain the simplices of
the filtered complex. You can find details of this in the report [34].



6.1. Persistent Homology

We start by considering sequences of simplicial complexes that grow in the
following manner.

Definition 6.1. A filtered simplicial complex is a sequence of simplicial complexes,
$\mathcal{K} = \{K_j\}_{j \ge 0}$, such that

$C_n(K_0) \subset C_n(K_1) \subset C_n(K_2) \subset C_n(K_3) \subset \cdots,$

for all $n \ge 0$, where $C_n(K)$ is the collection of all n-simplices in the complex K.

We say that a simplex $\sigma$ enters the filtered complex $\mathcal{K}$ at the entrance time i
if $\sigma \in K_i$ and $\sigma \notin K_j$ for all $j < i$. Occasionally, we shall use the term filtration
instead of filtered complex.

The usual computation of the homology groups of the complexes Kj is done 
for each j at a time, and so does not allow for comparison of homologies between 
complexes. The idea of persistent homology is to take the filtration into account 
and so be able to describe how homological properties persist or disappear as j
grows.

Denoting the n-cycles of a complex K by $Z_n(K)$ and the n-boundaries by $B_n(K)$,
note that any cycle in $Z_n(K_j)$ also belongs to $Z_n(K_{j+1})$ and boundaries in $B_n(K_j)$
belong to $B_n(K_{j+1})$. This allows us to define the linear maps

$i_*^{(n,j)} : H_n(K_j) \to H_n(K_{j+1}),$
$z + B_n(K_j) \mapsto z + B_n(K_{j+1}).$

Definition 6.2. The p-persistent n-th homology group of $K_j$ is defined by

$H_n^p(K_j) = i_*^p(H_n(K_j)) \subset H_n(K_{j+p}),$

where $i_*^p$ denotes the composition

$i_*^{(n,j+p-1)} \circ i_*^{(n,j+p-2)} \circ \cdots \circ i_*^{(n,j)}.$

The non-zero elements of the persistent homology group are the images of
n-cycles which exist at time j (i.e. belong to $K_j$) and which 'survive' until time j + p,
in the sense that they are not nullified by becoming a boundary.

We now restrict the discussion to filtered complexes of finite type. 

Definition 6.3. We say that a filtered simplicial complex $\mathcal{K} = \{K_j\}_{j \ge 0}$ is of finite
type if for all $j \ge 0$ and $n \ge 0$, $C_n(K_j)$ is finite and if there exists an index i such
that $K_j = K_i$ for all $j > i$.

Now fix $n \ge 0$. Recall that we are working with $\mathbb{Z}_2$ coefficients, so that the $H_n(K_j)$ are
all vector spaces. Using algebraic arguments¹ it can be shown that for any filtered
complex of finite type it is possible to choose bases $\{c_1^j, c_2^j, \ldots, c_{m_j}^j\}$ of $H_n(K_j)$,
for all $j \ge 0$, such that for any $1 \le k \le m_j$, $i_*(c_k^j) \in \{0, c_1^{j+1}, c_2^{j+1}, \ldots, c_{m_{j+1}}^{j+1}\}$ and
$i_*(c_k^j) = i_*(c_{k'}^j)$ for $k \ne k'$ only if $i_*(c_k^j) = 0$.

1 For details see [39]. 

imsart-coll ver. 2009/08/13 file: larry.tex date: March 29, 2010 



Persistent Homology for Random Fields and Complexes 






Figure 6.1 shows an example of this relation between the bases for a certain 
filtered complex. The elements below each of the homologies form a basis of the 
homology. Note that we have written $H_k = H_n(K_k)$ in order to save space.



Fig 6.1. Bases across the filtration. [Diagram not reproduced: the bases of $H_0, \ldots, H_6$ arranged in columns and linked by the maps $i_*$.]

For any basis element $c_l^j$ which is not an image of a previous basis element $c_{l'}^{j-1}$,
either there is a minimal number $p \ge 1$ such that $i_*^p(c_l^j) = 0$, or $i_*^p(c_l^j) \ne 0$ for any
$p \ge 1$. Each such element can be matched to the interval $(j, j+p)$ or, respectively,
$(j, \infty)$. These intervals give rise to the graphical presentation in the form of a
barcode. The horizontal axis of the barcode scheme represents time, i.e. the index
of the complex, and each bar spanning the interval from j to i corresponds to an
interval (j, i) linked to a basis element as above.



Fig 6.2. Barcode representation of homology bases. [Diagram not reproduced: horizontal bars over a time axis with ticks 1 to 7, indexed by j.]

Thus, for the bases of Figure 6.1, assuming that $K_j = K_6$ for all $j > 6$, the
barcode is as in Figure 6.2.

Note that the Betti numbers of each of the complexes in the filtration can be
easily derived from its barcode: $\beta_n(K_j)$ is the number of bars which intersect a
vertical line at time j, excluding those ending at time j.
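
This counting rule is short enough to state as code. The sketch below assumes, as our own convention, that each bar of a fixed dimension n is stored as a pair (birth, death), with `math.inf` marking a bar that never ends; the function name is hypothetical.

```python
import math
from typing import Iterable, Tuple

Bar = Tuple[float, float]  # (birth index, death index); death may be math.inf


def betti_from_barcode(bars: Iterable[Bar], j: float) -> int:
    """Count bars born at or before time j that have not ended by time j."""
    return sum(1 for birth, death in bars if birth <= j < death)


if __name__ == "__main__":
    # Hypothetical barcode in one fixed dimension.
    bars = [(0, math.inf), (1, 3), (2, 5)]
    for j in range(7):
        print(j, betti_from_barcode(bars, j))
```

The strict inequality on the death time implements the exclusion of bars ending at time j.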

Recall that we considered a single fixed dimension of the homology, n. The
collection of bars of persistent homologies of all dimensions is called the barcode
of the filtered complex (of finite type) $\mathcal{K} = \{K_j\}_{j \ge 0}$.
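
For readers who want to experiment, the following Python sketch computes such a barcode for a small filtered complex. It uses the standard column reduction of the boundary matrix over $\mathbb{Z}_2$, which is a well-known algorithm and not the basis construction described above; the two yield the same bars. The input format (a list of pairs of entrance time and vertex tuple) and the function name `barcode` are our own assumptions.

```python
import math
from typing import Iterable, List, Set, Tuple

Bar = Tuple[int, float, float]  # (dimension, birth time, death time)


def barcode(filtration: Iterable[Tuple[float, Iterable[int]]]) -> List[Bar]:
    """Barcode of a filtered simplicial complex with Z_2 coefficients.

    `filtration` lists every simplex once as (entrance time, vertices); each
    face must be present with an entrance time no later than its cofaces.
    """
    # Order by entrance time, breaking ties by dimension, so that faces
    # always come before the simplices they bound.
    simplices = sorted(
        ((t, tuple(sorted(s))) for t, s in filtration),
        key=lambda ts: (ts[0], len(ts[1]), ts[1]),
    )
    index = {s: i for i, (_, s) in enumerate(simplices)}

    # Boundary of each simplex as a set of face indices (a Z_2 column).
    columns: List[Set[int]] = []
    for _, s in simplices:
        if len(s) == 1:
            columns.append(set())
        else:
            columns.append({index[s[:i] + s[i + 1:]] for i in range(len(s))})

    pivot_owner = {}  # largest row index of a reduced column -> that column
    paired = set()
    bars: List[Bar] = []
    for j, col in enumerate(columns):
        # Add earlier columns (mod 2) until the pivot is new or the column is zero.
        while col and max(col) in pivot_owner:
            col ^= columns[pivot_owner[max(col)]]
        if col:
            i = max(col)
            pivot_owner[i] = j
            paired.update((i, j))
            birth, death = simplices[i][0], simplices[j][0]
            if death > birth:  # skip classes born and killed at the same time
                bars.append((len(simplices[i][1]) - 1, birth, death))
    # A simplex that creates a class which is never killed gives an infinite bar.
    for i, (t, s) in enumerate(simplices):
        if i not in paired:
            bars.append((len(s) - 1, t, math.inf))
    return sorted(bars)


if __name__ == "__main__":
    # Hypothetical example: a triangle whose interior is filled in last.
    triangle = [
        (0, (0,)), (0, (1,)), (0, (2,)),
        (1, (0, 1)), (1, (1, 2)), (2, (0, 2)),
        (3, (0, 1, 2)),
    ]
    for bar in barcode(triangle):
        print(bar)
```

In the example, two of the three components merge into the first at time 1, the loop appears when the third edge enters at time 2, and it is filled in at time 3.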

6.2. Random filtered complexes and entrance time fields 

Random filtered complexes are the link between the 'deterministic' notion of
persistent homology and the random setting.

Assuming the ubiquitous probability space $(\Omega, \mathcal{F}, P)$, a random filtered complex
will be defined as a mapping from $\Omega$ to some space F of filtered complexes. However,
allowing F to be too general makes it virtually impossible to define a meaningful
$\sigma$-algebra on it, and so we shall restrict our discussion here to cases in which the
elements of F are all sub-complexes of a finite universal complex. Among other things,
this implies that they are of finite type.

We begin with a trivial generalization of the definition of a filtered simplicial
complex. We now allow the set of indices of a filtered complex to be any well-ordered
infinite set in $\mathbb{R} \cup \{-\infty\}$, so that a filtered complex is now of the form
$\mathcal{K} = \{K_a\}_{a \in A}$. The definitions which followed the definition of filtered complexes
can be easily adjusted accordingly. The motivation for this is that, while in the
deterministic case one deals with a fixed filtered complex, with a fixed set of indices,
in the random case one needs to assign indices that are meaningful for a wider set
of possible outcomes, and the natural numbers no longer suffice as an index set.

In addition, for the discussion of random filtered complexes, we think of finite
type complexes as having only a finite number of indices by discarding the constant
tail of complexes. Formally, we can think about it as restricting ourselves to
discussing only complexes with tail defined in a canonical way: for all filtrations
$\{K_a\}_{a \in A}$, there exists $a_0 \in A$ such that for all $a_0 < a \in A$, $K_{a_0} = K_a$ and such
that $A \cap [a_0, \infty) = \{a_0 + i\}_{i=0}^{\infty}$. To save some tedious notation we simply work with
a finite portion of the complex.

Next, some notation. For a given simplex $\sigma$ in a filtered complex $\mathcal{K} = \{K_a\}_{a \in A}$, we define its
entrance time

$\mathrm{ent}(\sigma) = \mathrm{ent}_{\mathcal{K}}(\sigma) = \min\{a : \sigma \in K_a\},$

if the minimum is finite and $\infty$ otherwise. For a simplicial complex K, let $\mathfrak{F}(K)$
denote all finite type filtrations, $\{K_a\}_{a \in A}$, of K satisfying the condition that, for
any $a \in A$, there exists a simplex $\sigma \in K$ with entrance time $\mathrm{ent}(\sigma) = a$. This condition
basically says that we consider filtrations with no 'spare' complexes, which,
loosely speaking, contain no additional information.

Note that we then have the natural injective mapping

$\pi \equiv \pi_K : \mathfrak{F}(K) \to \prod_{\sigma \in K} \mathbb{R}_\sigma = \mathbb{R}^{\mathrm{Card}(K)}, \quad \{K_a\}_{a \in A} \mapsto \{\mathrm{ent}(\sigma)\}_{\sigma \in K}.$
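
As an illustration of this mapping, the sketch below computes the vector of entrance times of a small filtration and checks the 'no spare complexes' condition. The data layout (a dictionary from index a to the set of simplices in $K_a$, with simplices stored as frozensets of vertex labels) and the function names are our own assumptions.

```python
import math
from typing import Dict, FrozenSet, Iterable, Set

Simplex = FrozenSet[int]


def entrance_times(filtration: Dict[float, Set[Simplex]],
                   universe: Iterable[Simplex]) -> Dict[Simplex, float]:
    """ent(sigma) = min{a : sigma in K_a}, or infinity if sigma never enters."""
    ent = {}
    for sigma in universe:
        times = [a for a, K_a in filtration.items() if sigma in K_a]
        ent[sigma] = min(times) if times else math.inf
    return ent


def has_no_spare_complexes(filtration: Dict[float, Set[Simplex]],
                           universe: Iterable[Simplex]) -> bool:
    """True if every index a is the entrance time of at least one simplex."""
    used = set(entrance_times(filtration, universe).values())
    return all(a in used for a in filtration)


if __name__ == "__main__":
    # Hypothetical universal complex: a single edge with its two vertices.
    v0, v1, e = frozenset({0}), frozenset({1}), frozenset({0, 1})
    K = [v0, v1, e]
    filt = {0.5: {v0}, 1.2: {v0, v1}, 3.0: {v0, v1, e}}
    print(entrance_times(filt, K))
    print(has_no_spare_complexes(filt, K))
```

Adding a further index whose complex equals the previous one would make the second check fail, which is exactly the redundancy the condition rules out.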

Using the mapping $\pi$, $\mathfrak{F}(K)$ can be endowed with the structure of a measurable
space, determined by the rule: $F \subset \mathfrak{F}(K)$ is measurable if and only if $F = \pi^{-1}(B)$
for some Borel set B in $\prod_{\sigma \in K} \mathbb{R}_\sigma$, with respect to the standard product topology
on $\prod_{\sigma \in K} \mathbb{R}_\sigma$. We denote this $\sigma$-algebra by $\mathfrak{B}(K)$.

Note that $\mathfrak{B}(K)$ is the Borel $\sigma$-algebra on $\mathfrak{F}(K)$, when endowing it with the
topology defined by the rule: $F \subset \mathfrak{F}(K)$ is open if and only if $F = \pi^{-1}(G)$ for some
open set G in $\prod_{\sigma \in K} \mathbb{R}_\sigma$.

We are now finally ready to define the random filtration of a complex. 

Definition 6.4. Let K be a fixed universal complex. A random filtration of K is a
measurable function $\mathcal{K} : (\Omega, \mathcal{F}) \to (\mathfrak{F}(K), \mathfrak{B}(K))$.

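The following lemma, in which $\bar{\mathbb{R}}$ denotes the two point compactification of $\mathbb{R}$,
is now straightforward.

Lemma 6.5. For a finite complex K, the mapping $\mathcal{K} : (\Omega, \mathcal{F}) \to (\mathfrak{F}(K), \mathfrak{B}(K))$
is measurable if and only if $\mathrm{ent}_{\mathcal{K}(\omega)}(\sigma) : (\Omega, \mathcal{F}) \to (\bar{\mathbb{R}}, \mathfrak{B}(\bar{\mathbb{R}}))$ is measurable for all
$\sigma \in K$ (where $\mathfrak{B}(\bar{\mathbb{R}})$ is the Borel $\sigma$-algebra on $\bar{\mathbb{R}}$).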

Lemma 6.5 implies that $\pi \circ \mathcal{K}$ is a random field on K. The next result shows that
under certain compatibility conditions on a field on K, the converse is also true.

Definition 6.6. Let $E = \{E_\sigma\}_{\sigma \in K}$ be a random field on a finite simplicial complex
K. If $E_\sigma(\omega) \le E_\tau(\omega)$ for any simplices $\sigma, \tau \in K$ for which $\sigma \subset \tau$, we say that E
is an entrance time field on K.

Corollary 6.7. If $E = \{E_\sigma\}_{\sigma \in K}$ is an entrance time field on a finite simplicial
complex K, then $\pi^{-1}(E)$ is a random filtration of K. Moreover, $\pi$ gives a 1-1
correspondence between random filtrations and entrance time fields on K.

Note that even when a field E on K does not satisfy the condition of Definition
6.6 we can define an entrance time field by $\tilde{E}_\sigma = \max\{E_\tau : \tau \subset \sigma\}$.
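
A minimal Python sketch of this construction, for one realisation of the field, is given below. It assumes that the maximum runs over all faces of σ including σ itself (otherwise it would be undefined for vertices), that the field is stored as a dictionary defined on every simplex of K, and that the filtration is recovered as $K_a = \{\sigma : E_\sigma \le a\}$; all function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Simplex = FrozenSet[int]


def faces(sigma: Simplex) -> List[Simplex]:
    """All non-empty faces of sigma, including sigma itself."""
    verts = sorted(sigma)
    return [frozenset(c)
            for size in range(1, len(verts) + 1)
            for c in combinations(verts, size)]


def is_entrance_time_field(E: Dict[Simplex, float]) -> bool:
    """Check E_tau <= E_sigma whenever tau is a face of sigma (Definition 6.6)."""
    return all(E[tau] <= E[sigma] for sigma in E for tau in faces(sigma))


def to_entrance_time_field(E: Dict[Simplex, float]) -> Dict[Simplex, float]:
    """Replace E_sigma by the maximum of E over the faces of sigma."""
    return {sigma: max(E[tau] for tau in faces(sigma)) for sigma in E}


def filtration_from_field(E: Dict[Simplex, float]) -> List[Tuple[float, Set[Simplex]]]:
    """Sublevel complexes K_a = {sigma : E_sigma <= a}, one per distinct value a."""
    return [(a, {s for s, t in E.items() if t <= a})
            for a in sorted(set(E.values()))]


if __name__ == "__main__":
    # Hypothetical realisation on an edge: the edge value is below a vertex value.
    v0, v1, e = frozenset({0}), frozenset({1}), frozenset({0, 1})
    E = {v0: 0.3, v1: 0.9, e: 0.5}
    print(is_entrance_time_field(E))
    fixed = to_entrance_time_field(E)
    print(fixed[e], is_entrance_time_field(fixed))
    print(filtration_from_field(fixed))
```

Because the corrected field is monotone along faces, every sublevel set returned by `filtration_from_field` is closed under taking faces and is therefore a genuine simplicial complex.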